1.0 INTRODUCTION

This design is for a stand-alone NS32CG16 execution vehicle.
The design includes the NS32081 Floating Point Unit,
DP8511 BitBlt Processing Unit, NS32202 Interrupt Control
Unit and SCN2681 dual-channel serial interface. MONCG, a
modified version of MON16, is supported in this design for
interface to DBG32 debug utilities. The NS32202 timers
may be used to time program execution.

1.1 Specification 

1. 256 Kbytes of Static RAM (32K x 8).

2. 4 sockets of EPROM, capable of 27C256 or 27C512
EPROM.

3. Serial I/O: 2 ports RS-232, configured for
MONCG/DBG16 debug.

4. Circuits required to interface the DP8511 with the
NS32CG16. The interface utilizes an 8-bit counter to
control the DP8511, allowing a pattern up to 256 words
wide to be BitBlted.

5. Memory and I/O map controlled with PALs to allow 
changes. 

6. Simple LED indicators to show board status. 

7. NS32081D-15 installed for floating point tests. 

8. NS32202-10 interface to verify interrupts. 

9. 3 push buttons, INT into NS32202, NMI and RESET to 
NS32CG16. 

10. NS32CG16V-15 installed. 

11. MONCG (a new mon16) installed in EPROMs. 



12. HOLD/input available for testing. 

13. "SPLICE" Control Signal interface. 

14. PROM Shadow feature. 

15. Operates at 15 MHz. 

1.2 BPU Control Circuit Programming Information 

On the board, the addresses of the control registers are
PAL programmable, but default to the following:

Address     Description

0xFF0000    DUART (SCN2681)
0xFF0020    BPU Control Register on DP8511
0xFF0022    BPU Function Select Register on DP8511
0xFF0040    BPU Mask Register
0xFF0060    BPU Counter Register

The BPU Control register and Function Select register are
as described in the DP8511 BitBlt Processing Unit data
sheet, and are 13 bits and 4 bits wide respectively.

When programming the control logic, the following sequence
should be used.

1. Write BPU Counter Register

2. Write BPU Mask Register

3. Write BPU Control Register

4. Write BPU Function Select Register

Note that the BPU Enable in the Mask register must be
turned on prior to writing the BPU Control register.
When the BitBlt operation is complete, it is recommended
that the BPU be turned off by writing a zero to the BPU
Mask register. This is not required, however.
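
The write order above can be sketched as a small Python model that only
builds the list of register writes. The helper names are hypothetical and
nothing here touches hardware. The counter address 0xFF0060 is taken from
the 0x60 offset used by the example programs later in this note, and the
sample values (0xFE, 0x6E, 0x0F10, 0x7) are the ones those programs and the
counter register description use.

```python
# Default register addresses (PAL programmable on the board).
BPU_CONTROL = 0xFF0020   # 13-bit BPU Control Register on the DP8511
BPU_FUNCTION = 0xFF0022  # 4-bit BPU Function Select Register on the DP8511
BPU_MASK = 0xFF0040      # BPU Mask Register
BPU_COUNTER = 0xFF0060   # BPU Counter Register


def bpu_setup_writes(counter, mask, control, function):
    """Return (address, value) pairs in the recommended write order.

    Hypothetical helper: counter first, then mask (which carries the
    BPU Enable bit), then control, then function select.
    """
    return [
        (BPU_COUNTER, counter & 0xFF),
        (BPU_MASK, mask & 0xFF),
        (BPU_CONTROL, control & 0x1FFF),
        (BPU_FUNCTION, function & 0xF),
    ]


def bpu_shutdown_write():
    """Writing zero to the mask register turns the BPU off (optional)."""
    return (BPU_MASK, 0x00)


writes = bpu_setup_writes(counter=0xFE, mask=0x6E, control=0x0F10, function=0x7)
for address, value in writes:
    print(f"write 0x{value:04X} -> 0x{address:06X}")
```

Keeping the order in one helper makes it harder to write the control
register before the BPU Enable bit has been set in the mask register.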

2.0 NS32CG16/DP8511 INTERFACE

The following sections describe the logic required to interface
the NS32CG16 to the DP8511 BitBlt Processing Unit
(BPU). The NS32CG16 simplifies the interface by supporting
a special signal, BPU, and a BitBlt instruction, EXTBLT.

The schematic and PAL equations in this document describe
an implementation that supports BitBlt operations in
4 directions: left to right or right to left, while moving top to
bottom or bottom to top. Most typical printer applications
require only left to right BitBlting while moving top to bottom.

2.1 Features 

Following are the features of the described interface:

• 32-bit CPU.

• 16 megabyte address range for BitBlt operations.

• BitBlt operations in all 4 directions.

• 16 logical BitBlt functions.

• Operates with conventional DRAMs.

• High-speed barrel shift of data.

• Hardware masking of data.

• Bus bandwidth limits on external BitBlting.

2.2 Image Memory Configuration

To obtain optimum performance in graphics applications the
Image Memory must be organized to support the Series
32000 byte and bit manipulating instructions. Figure 2.1, below,
illustrates the memory organization at the byte (8 bits),
word (16 bits) and double-word (32 bits) level.

In the rest of this document hexadecimal numbers will be
represented with a leading 0x, e.g., hexadecimal 1a5a will
appear in this document as 0x1a5a.

Figure 2.1 represents one scan line of a standard 8 1/2 inch
by 11 inch page, on a 300 Dot Per Inch (DPI) laser printer in
the portrait orientation (8 1/2 inches wide, 11 inches high).
There are 3300 such scan lines on each page. The start of
the second scan line on the page would be at byte offset
320 decimal, 140 hex. Since the first scan line is at the top
of the page, successive scan lines proceed down the page.

All Series 32000 microprocessors have 32-bit internal data
paths, with a "natural" size of 32 bits, or 4 bytes. Memory
accesses are always Least Significant Byte (LSB) to Most
Significant Byte (MSB). Referring to Figure 2.1, writing a
byte of 0xA5 to address zero would result in address zero
containing 0xA5. Writing a word of 0xA55A to address zero
would result in address zero containing 0x5A, and address 1
containing 0xA5. Writing a doubleword of 0xFFA55A00 to
address zero would result in address zero containing 0x00,
address 1 containing 0x5A, address 2 containing 0xA5, and
address 3 containing 0xFF.

The Series 32000 microprocessors do not have an alignment
restriction: data of byte, word or doubleword
size need not reside on an even memory address. The Bus
Interface Unit, internal to all Series 32000 microprocessors,
requests multiple bus transfers as required, aligning the data
automatically.
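
The byte ordering described above can be checked with a short Python
sketch. It uses the standard `struct` module with the little-endian
format prefix `<` to model the LSB-first memory layout; the `memory`
buffer is only an illustration, not a model of the board.

```python
import struct

memory = bytearray(4)  # four bytes starting at "address zero"

# Byte write of 0xA5 to address zero.
struct.pack_into("<B", memory, 0, 0xA5)
byte_result = bytes(memory[:1])

# Word write of 0xA55A: low byte lands at address 0, high byte at address 1.
struct.pack_into("<H", memory, 0, 0xA55A)
word_result = bytes(memory[:2])

# Doubleword write of 0xFFA55A00: bytes land LSB first at addresses 0..3.
struct.pack_into("<I", memory, 0, 0xFFA55A00)
dword_result = bytes(memory[:4])

print(byte_result.hex(), word_result.hex(), dword_result.hex())
```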

The bit offset is equally consistent. Bit ordering is always
least significant to most significant bit. In Figure 2.1, bit zero
of byte zero would be the first pixel imaged on the page. Bit
one would be the next pixel, bit two the next, and so on. Bit
2549 would be the last pixel imaged on the page in the
horizontal direction, since 8 1/2 inches * 300 DPI yields a
width of 2,550 dots, or pixels. Bit 2549 is contained within
byte 318, at bit position 5. Both Bit Addressing (e.g., SBITD
2549,page) and Byte Addressing with a byte address and a
bit offset (e.g., SBITD 5,page + 318) are available in Series
32000. Figure 2.2 is an expansion of the first three bytes of
the scan line, showing the bit addressing, as it would appear
on the page printer or graphics screen.

To clarify these conventions further, the following example
illustrates how a line 1 dot high and 10 dots wide is drawn.
This line appears on scan line one, starting at the ninth
pixel, or bit position 8. This will result in 0xFF in address
one, and 0x03 in address two. This is referred to as the
horizontal direction.
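
The horizontal line example can be reproduced with a minimal Python
sketch. The helpers `set_pixel` and `draw_hline` are hypothetical names
used only for illustration; the buffer is one 320-byte scan line, with
bit zero of byte zero as the first pixel.

```python
SCAN_LINE_BYTES = 320  # one scan line of the sample page


def set_pixel(image, bit_offset):
    """Set one pixel; bit 0 of byte 0 is the first pixel on the line."""
    image[bit_offset // 8] |= 1 << (bit_offset % 8)


def draw_hline(image, start_bit, length):
    """Draw a line 1 dot high and `length` dots wide."""
    for bit in range(start_bit, start_bit + length):
        set_pixel(image, bit)


scan_line = bytearray(SCAN_LINE_BYTES)
draw_hline(scan_line, start_bit=8, length=10)
print(hex(scan_line[1]), hex(scan_line[2]))

# Byte and bit position of the last visible pixel on the line.
last_byte, last_bit = divmod(2549, 8)
```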

The width of the memory for an image is referred to as the
image warp. The warp of the page printer image in the previous
example is 320 decimal (140 hex) bytes, or 2560 bits.
Note that the image width is actually 2550 on this sample
page printer at 300 DPI, since 8 1/2 inches * 300 DPI yields a
width of 2,550 dots. The width is rounded up to 2560 bits
(320 bytes) to make memory addressing simpler in a typical
hardware design.

When the warp is known, perpendicular (or vertical, in this
case) lines can be drawn. A vertical line 10 dots high and 1
dot wide starting at the first line, ninth pixel, with a warp of
320 (140 hex) and a base address of 0 would result in addresses
1 (1 hex), 321 (141 hex), 641 (281 hex) . . . 2881
(B41 hex) each containing 01 hex.
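
The vertical line addresses follow directly from the warp. The sketch
below is an illustration in Python; `vline_addresses` is a hypothetical
helper that returns the byte address and the bit pattern to write for
each scan line.

```python
WARP = 320  # bytes per scan line


def vline_addresses(base, first_line, x_bit, height, warp=WARP):
    """Return (byte address, byte value) pairs for a 1-dot-wide vertical line."""
    byte_in_line, bit_in_byte = divmod(x_bit, 8)
    value = 1 << bit_in_byte
    return [
        (base + (first_line + n) * warp + byte_in_line, value)
        for n in range(height)
    ]


addresses = vline_addresses(base=0, first_line=0, x_bit=8, height=10)
for address, value in addresses:
    print(f"0x{address:03X}: 0x{value:02X}")
```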

To summarize, for portrait applications, the "top left" pixel is
bit 0. The "top right" pixel is bit 2,549. The "bottom left"
pixel is bit 8,445,440. The "bottom right" pixel is bit
8,447,989. To calculate x,y bit positions on the page, the
formula:

Bit offset = (y * 2560) + x

may be used, where y is the scan line number ranging from
0 to 3299, for the sample 8 1/2 by 11 inch page, and x is the
pixel displacement across the page from the left hand edge.
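
As a quick check of the bit offset formula, the four corner pixels quoted
in the summary can be computed in Python. Here `y` is taken to be the
scan line number and `x` the pixel displacement from the left hand edge;
`bit_offset` is a hypothetical helper name.

```python
WARP_BITS = 2560     # bits per scan line (320 bytes)
PAGE_WIDTH = 2550    # visible pixels per scan line
PAGE_LINES = 3300    # scan lines per page


def bit_offset(x, y):
    """Bit offset of pixel (x, y) from the start of the image."""
    return y * WARP_BITS + x


top_left = bit_offset(0, 0)
top_right = bit_offset(PAGE_WIDTH - 1, 0)
bottom_left = bit_offset(0, PAGE_LINES - 1)
bottom_right = bit_offset(PAGE_WIDTH - 1, PAGE_LINES - 1)
print(top_left, top_right, bottom_left, bottom_right)
```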
FIGURE 2.1. Memory Organization

FIGURE 2.2. Bit Order within a Byte



2.3 BitBlt Operation

It is assumed that the reader is familiar with the fundamentals
of BitBlt operations. More information on the BitBlt algorithm
may be found in the DP8511 BitBlt Processing Unit
(BPU) data sheet.

In this circuit the BPU is connected to an NS32CG16 CPU.
During the external BitBlt instruction execution, the CPU
generates an optional source word pre-read, source word
read, destination word read and destination word write,
while asserting the BPU signal. The BPU interface circuit
monitors the BPU signal, controls the BPU and asserts
left and right masks when appropriate. The interface has an
8-bit counter, enabling blocks up to 256 words wide to be
transferred.

Prior to executing the EXTBLT instruction the software must
first load the character width count register with the width (in
words) of the BitBlt operation, and then load the mask register
with the left and right beginning and ending mask bits, pre-read
enable and BPU enable. The counter register is written first
because the write to the mask register loads the counter.
Then the software loads the BPU control register with the
Barrel Swap bit, the shift amount, the left mask, and the right
mask values, and loads the BPU Function select register. The
software must then load the CPU registers with all the
information required by the EXTBLT instruction; for more
details, refer to the NS32CG16 Programmer's Reference
Manual. The EXTBLT instruction will then cause the BPU to
perform the required BitBlt operation for the width of the
character by the height of the character.

The mask register, the character width counter register, the
BPU control register and the BPU Function select register
are mapped as output devices in the CPU's address space.
The addresses of these registers are implementation dependent
since the EXTBLT instruction does not reference them
directly. The BPU is selected by the BPU signal which is
asserted during the execution of the EXTBLT instruction.

The BPU control register is shown in Figure 2.3a. This register
is on the BPU device. Note: the left mask is asserted at
the beginning of a line on the left side of the page, the right
mask is asserted at the end of a line on the right side of the
page. This satisfies the bit ordering of the Series 32000, in
which bit zero of byte zero is the first imaged pixel.

The programming of the BPU control register changes depending
on the direction of the BitBlt, either left to right or
right to left. The mask enable bits for the left and right
masks and the BIS bit remain the same for top to bottom or
bottom to top.

FIGURE 2.3a. BPU Control Register



Description of control register bits.

RM

0000
0001
1 000

Barrel shift quantity. Causes the 32-bit barrel
shifter to rotate 0-15 bits MSB to LSB.

Left to right, shift > zero.
Left to right, shift = zero.
Right to left, any shift.

FIGURE 2.3b. BPU Function Select Register

Description of Function Select register bits.

The structure of the mask register is shown in Figure 2.4. It
is an 8-bit register, located on the least significant byte of
the data bus. Following is a description of valid mask enable
sequences that can be programmed into the mask register.

A left mask in the following documentation means a mask
that preserves the least significant bits; a right mask preserves
the most significant bits. The BPU applies the left
and right masks on the write to the destination.



BRME BLME ERME ELME 



Left to right, both masks, width > 1 

Right to left, both masks, width > 1 

Shift of zero, no masks, width = n * 16 bits

Any direction, both masks, width = 1 

Any direction, left mask, width = 1 

Any direction, right mask, width = 1 



ENBPU Active High. When low, resets the BPU and the 
interface circuitry state machine. 

PRERD Active High. When asserted, forces the interface
circuitry to perform an additional read at the beginning
of each BitBlt source read. This is termed
source pre-read, and is required at the beginning
of each line if the width of the first destination data
write is greater than the amount of valid data contained
in the first source read.



Description of interface signals follows.

FIGURE 2.4. Mask Register

The structure of the counter register is shown in Figure 2.5.
It is an 8-bit register, located on the least significant byte of
the data bus. The value written to this register is the two's
complement of the width in words of the destination data to
be BitBlted. Each word is 16 bits in width, thus for a font
character 32 pixels wide, the counter register would be
loaded with the value 0xFE. The two's complement of the
width can easily be obtained via the NS32CG16's NEGi
gen, gen instruction.

FIGURE 2.5. Counter Register

2.4 Interface Circuit Description 

The EXTBLT instruction causes the CPU to output source
and destination addresses, the BPU signal and read/write
bus cycles. The BPU interface circuit monitors these signals
and issues control signals to the BPU. The BPU must be
interfaced to the data bus on the system side of the transceivers.
During BPU data transfer cycles, a signal is generated
by the BPU interface circuit to disable the data bus
transceivers. The CPU must be disabled from driving the
system data bus during an EXTBLT write cycle because the
BPU will be driving the BitBlt result onto the bus.

A detailed description of the BPU interface circuit and all the
external signals follows.



Signal       Description

BPU          From NS32CG16, BPU cycle active.

ST1-ST3      From NS32CG16, status signals
             describing the type of bus cycle.

BD00-BD15    Buffered system data bus.

RESET        From NS32CG16, reset signal.

TSO          From NS32CG16, signifies data
             portion of bus cycle.

CTTL         From NS32CG16, TTL clock output.

DDIN         From NS32CG16, signifies direction
             of data transfer.

CSYNC        Similar timing to TSO, but is deasserted
             one T-state early, at the end of T3.

BPUCYC       Signal to turn off the data
             bus transceivers. The CPU data bus must
             be isolated from the BPU-memory data
             bus during the execution of
             the EXTBLT instruction.

The following signals are decoded address lines, gated with TSO:

BPUMSKWR     Write the 8-bit mask register.
BPUCNTWR     Write the 8-bit counter register.
BPUCTLWR     Write the BPU control register.



Refer to the BPU interface schematic diagram for the rest of
this section. Functional description of the interface circuit
follows.

2C and 6C decode status signals from the CPU along with
the BPU signal to produce the BPUCYC signal. BPUCYC
indicates that the current bus cycle is an EXTBLT data
transfer cycle. The status signals must be used to qualify
the BPU signal because the CPU can assert BPU and then
perform instruction pre-reads to fill its internal pre-fetch
queue. The status signals are decoded to uniquely detect a
data transfer cycle, as against any other type of bus cycle.

3F is the mask register. It is a 6-bit write only latch with
reset. On power-on the NS32CG16 RESET output signal
will cause all the outputs of the latch to be cleared to zeroes.
Writing to the latch simply involves a move of a byte to
the BPUMSKWR address. Once programmed the mask register
will remain unchanged until it is written to again or
RESET is asserted.



3D is the counter register. It is an 8-bit write only latch and
counter. Writing to the latch simply involves a move of a
byte to the BPUCNTWR address. Once programmed the
counter register will remain unchanged until it is written to
again. The synchronous binary up counter portion of 3D is
loaded when 4D asserts the CTR_LOAD output. CTR_LOAD
is asserted when the BPUMSKWR signal is active,
which implies that the programmer must program the counter
register prior to the mask register. The counter in 3D
enables characters up to 256 words wide to be BitBlted with
the BPU. Note again that the value programmed into the
counter register must be the two's complement of the width
in words of the character.
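
The counter value is simply the 8-bit two's complement of the width in
words. The Python sketch below shows the arithmetic; `counter_value` and
`words_for_pixels` are hypothetical helper names, and the rounding in
`words_for_pixels` assumes a character that starts on a word boundary.

```python
def counter_value(width_words):
    """8-bit two's complement of the BitBlt width in 16-bit words."""
    if not 1 <= width_words <= 256:
        raise ValueError("the 8-bit counter supports widths of 1 to 256 words")
    return (-width_words) & 0xFF


def words_for_pixels(pixels):
    """Number of 16-bit words needed for `pixels` (word-aligned assumption)."""
    return (pixels + 15) // 16


# A font character 32 pixels wide is 2 words, giving 0xFE.
example = counter_value(words_for_pixels(32))
print(hex(example))
```

A width of 256 words wraps to a counter value of 0x00, which is why 256
is the largest width the 8-bit counter can express.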

BPU bus cycles consist of an optional source pre-read (only
on the first word of a new line), source read, destination
read and destination write. 4D detects destination writes
and only increments or re-loads 3D on completion of the
write. The PRERD bit in the mask register must be set by
the programmer if pre-reads are required. This bit causes
the state machine in 4D to perform an extra source read
(pre-read) at the beginning of each BitBlt line.
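
The order of bus cycles within one BitBlt line can be written out as a
short Python sketch. This is only a model of the sequence described
above, with `line_cycles` as a hypothetical helper name; it says nothing
about timing.

```python
def line_cycles(width_words, prerd):
    """List the BPU bus cycles for one BitBlt line, in order."""
    cycles = []
    for word in range(width_words):
        if prerd and word == 0:
            # The pre-read happens only on the first word of a new line.
            cycles.append("source pre-read")
        cycles += ["source read", "destination read", "destination write"]
    return cycles


for cycle in line_cycles(width_words=2, prerd=True):
    print(cycle)
```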



CTR_LOAD is also asserted on the last BPU write cycle at
the end of each BitBlt line, causing the value in the counter
register, 3D, to be re-loaded into the 8-bit counter in preparation
for the next BitBlt line. The TSO signal from the NS32CG16
is the clock, rising (positive-going) edge sensitive, for 4D
and 3D. During a BitBlt of more than one word in width, 4D
also asserts the CTR_ENBP signal to enable 3D to count
on the next rising edge of TSO. 3D counts up until it reaches
a count of 255, at which point the RCO of 3D is asserted. 4D
treats the assertion of the RCO signal as an indication that
the next BPU bus cycle (source read, destination read and
destination write) is at the end of the BitBlt line. 4D then asserts
the MASK_SEL and MASK_ENB signals, and causes 3D
to re-load from the counter register as explained above. The
MASK_SEL and MASK_ENB signals control the multiplexer
in 4F, selecting the appropriate mask control signals
that connect to the BPU; refer to the description of the mask
register in Section 2.3.
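
The counter behaviour can be illustrated with a simplified behavioural
model in Python. It is a sketch under stated assumptions, not a gate-level
description: the counter is loaded from the latch when the mask register
is written, advances once per destination write, and a count of 255 (RCO)
marks the last word of the line. `line_mask_trace` is a hypothetical name;
each trace entry is (first word of line, last word of line).

```python
def line_mask_trace(width_words, lines=1):
    """Trace which words are treated as the first and last of each line."""
    latch = (-width_words) & 0xFF   # value written to the counter register
    counter = latch                 # loaded when the mask register is written
    first_word = True               # beginning-mask flag, set by the mask write
    trace = []
    for _ in range(width_words * lines):
        rco = counter == 0xFF       # ripple carry out at a count of 255
        trace.append((first_word, rco))
        # On completion of the destination write:
        if rco:
            counter = latch         # re-load for the next line
            first_word = True
        else:
            counter = (counter + 1) & 0xFF
            first_word = False
    return trace


print(line_mask_trace(3))
print(line_mask_trace(1, lines=2))
```

With a width of one word the same word is both first and last, which
matches the separate width = 1 entries in the mask register table.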

Refer to the attached timing diagrams for detailed BPU control
signal timing. Figure 2.6 depicts a complete BPU cycle
with pre-read. Only the data path control signals are shown.
Figure 2.7 depicts a write bus cycle to the mask register.
Figure 2.8 depicts the assertion of the appropriate masks
during the first, a middle word and then the last word of a
BPU BitBlt line.



4D generates a signal, DESTCYC, that indicates the type of
bus access that the EXTBLT instruction is performing.
When high, it indicates a source pre-read or read; when low,
it indicates either a destination read or destination write.
DESTCYC connects to 4F which controls most functions of
the BPU.

4F controls the BPU data paths, FIFO operation, and control
registers. The programmer must load the BPU control
register prior to executing the EXTBLT instruction; refer to
Section 2.3 for a detailed description of the bits. The programmer
accesses the 13-bit, write only control register by
writing to the BPUCTLWR address. 2C generates the CRE
signal which writes the 13-bit data into the BPU control register.

2C also generates the FSE signal which writes 4-bit data
into the BPU Function select register. All references to registers
within the BPU use the same terminology as in the
DP8511 data sheet. 2D is clocked from inverted CTTL; its
function is to delay the assertion of the FRD, FWR, RME,
LME, BSE and DLE signals to satisfy the setup and hold
time requirements of the DP8511. The B_DLE signal causes
the BPU to latch the data on the data bus into the DIL-MASTER
register. Both the source and destination read
data are temporarily stored in this register during the EXTBLT
instruction execution. The B_BSE signal causes the BPU
to latch the data from the DIL-MASTER register into the
DIL-SOURCE register during a source pre-read or read bus
cycle.

When B_BSE is not asserted, the data contained in the
DIL-MASTER register will be latched into the DIL-DEST register.
The DIL-DEST register contains the read destination
data.

The barrel rotator performs the rotation, 0 to 15 bits. The
result is transferred to a multiplexer at the input of the 16
word FIFO. 4F generates the FWR signal to write this value
into FIFO location 0. 4F delays the assertion of FRD two
clocks, as required by the BPU. The FRD signal transfers
the data stored in FIFO location 0 to a holding latch, then
through another multiplexer (always in BitBlt mode, B/L low)
to the source input of the BitBlt Logic Unit, BLU.

B_BSE is deasserted during the destination
read, causing the DIL-MASTER data to be loaded into the
DIL-DEST register, through to the destination input of the
BLU.

The BLU performs the required logical operation, based on
the 4-bit Function Select code programmed into the BPU
Function select register. The left and right masks are then
applied to the result and finally the destination read data is
ORed with this result. This method of masking is called
destination masking and is different from the NS32CG16's
software BitBlt instructions, which perform source masking.
The final result is the same regardless of the method used.
The result from the BLU is now available for writing back to
memory. The NS32CG16 performs a write bus cycle, 6C
generates DOE which enables the BPU output buffers, the
result appears on DQ00-DQ15 and is written to memory.
The entire BitBlt operation takes 12 clock cycles to perform
a source read, destination read and destination write. The
whole BitBlt cycle can then repeat for the next word of the
BitBlt line. Note that interrupts, if enabled and pending, will
be serviced at the end of each BPU write cycle. The pre-read
(optional), read source, read destination and write destination
cycle is indivisible at the interrupt level. The NS32CG16
will deassert the BPU signal prior to fetching the vector from
the interrupt source. The BPU signal remains deasserted
during the entire interrupt service routine and only on return
from interrupt and resumption of the EXTBLT instruction will
the NS32CG16 again assert the BPU signal.
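
The 12 clock figure gives a rough upper bound on EXTBLT throughput at the
board's 15 MHz clock. The Python sketch below is an estimate only. It
assumes no wait states and no interrupts, ignores instruction set-up
time, and assumes a pre-read costs 4 clocks (one third of the 12 clocks
quoted for three bus cycles); none of these assumptions come from the
data sheet.

```python
CLOCK_HZ = 15_000_000     # board clock, from the specification list
CLOCKS_PER_WORD = 12      # source read + destination read + destination write
CLOCKS_PER_PREREAD = 4    # ASSUMPTION: one extra bus cycle per line


def extblt_clocks(width_words, height_lines, prerd=False):
    """Estimated clock count for a width x height BitBlt."""
    per_line = width_words * CLOCKS_PER_WORD
    if prerd:
        per_line += CLOCKS_PER_PREREAD
    return per_line * height_lines


def extblt_seconds(width_words, height_lines, prerd=False):
    """Estimated time in seconds at CLOCK_HZ."""
    return extblt_clocks(width_words, height_lines, prerd) / CLOCK_HZ


print(extblt_clocks(2, 108), extblt_seconds(2, 108))
```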
The DP8511 BPU has many functions that are not used by
the NS32CG16 during the EXTBLT instruction execution.
Figure 2.9 depicts the functional blocks inside the BPU that
are used during an EXTBLT instruction. Refer to the
DP8511 Data Sheet for the complete BPU model.

Following are two example programs that perform tests of
the EXTBLT instruction and interface. The first program performs
a left to right, top to bottom test; the second performs
a right to left, bottom to top test. The programs check the
result of the EXTBLT instruction by comparing the output
with that from the BBFOR instruction. If the results are the
same, the shift amount is incremented and the test is performed
again. The programs test the EXTBLT for shifts of
zero through to 15.



#Program extblt.s
#Program to test the extblt instruction, left to right, top to bottom
        .globl  test,dest
#BitBlt test program
test:   movqd   0,shift                 # start with shift of zero
loop1:  movd    $108,height             # height in lines
        movqd   1,width                 # start with width of 1 word
loop:   addr    dest,r0                 #point to destination block
        movqd   4,r1                    #increment value
        addr    1024,r2                 #number of patterns to write
        movqd   0,r3                    #pattern to write
        movmpd                          #fill area
        addr    0xff0000,r0             #point to the control base
        movd    width,r2                # get current width
        movd    shift,r1                # get current shift value
        movb    $0x0e,r3                # assume shift is zero.
        movqd   4,r7                    # set destination warp
        movqd   0,r6                    # set source warp
        cmpqd   0,r1                    # is shift zero?
        beq     noinc                   # yes, all is ok, else
        movb    $0x6e,r3                # set up left and right masks
        addqd   1,r2                    # one extra word of destination
        movqd   2,r7                    # set destination warp
        movqd   -2,r6                   # set source warp
noinc:  negb    r2,0x60(r0)             # set up counter
        movb    r3,0x40(r0)             #set up mask register
        movw    bputab[r1:w],0x20(r0)   #set up BPU register
        movw    $0x7,0x22(r0)           #set up BPU register, OR function
        addr    chara-2,r0              #point to source character
        addr    dest,r1                 #point to destination
        movd    height,r3               # get current height
        movqd   2,r4                    #increment value
        addd    r2,r2                   # width = r4 * r2
        movd    r2,r5
        cmpqb   1,$2                    #do pre read
        extblt
        movqb   $0,0x40 + 0xff0000      #clear mask register, disable
                                        #bpu, reset logic
        addr    chara,r0                #point to source char
        addr    dest1,r1                #point to destination
        movd    shift,r2                #shift value wanted
        movd    height,r3               #height in lines
        movd    $0xffff,r4              #first mask
        movd    $0xffff,r5              #second mask
        movqd   2,r6                    #source warp
        movqd   4,r7                    #dest warp
        movd    width,tos               #width in words
        cmpb    r2,$0
        bbfor
        cmpqd   0,tos                   #unstack
        addr    512,r0                  #number of doubles to compare
        addr    dest1,r1                #dest1
        addr    dest,r2                 #dest
        cmpsd                           #compare those strings
        bne     bad
        addqd   1,width                 # next width
        movd    $54*4,r0                # get max lines
        divd    width,r0                # divide to get current lines
        movd    r0,height               # and store it
        cmpqd   1,r0                    # is it OK?
        blt     loop
        addqd   1,shift                 # next shift
        cmpd    $16,shift               # done yet?
        bne     loop1                   # no, back for more
bad:    bpt
        .data
        .data
bputab: .word   0x100f                  # shift of zero, set masks & BIS
        .word   0x0f10                  # shift of one
        .word   0x0e21                  # shift of two
        .word   0x0d32                  # shift of three
        .word   0x0c43                  # 4
        .word   0x0b54                  # 5
        .word   0x0a65                  # 6
        .word   0x0976                  # 7
        .word   0x0887                  # 8
        .word   0x0798                  # 9
        .word   0x06a9                  # 10
        .word   0x05ba                  # 11
        .word   0x04cb                  # 12
        .word   0x03dc                  # 13
        .word   0x02ed                  # 14
        .word   0x01fe                  # 15
width:  .double 0
shift:  .double 0
count:  .double 0
height: .double 0
        .comm   dest,2048
        .comm   dest1,2048



#Program extblt.s
#Program to test the extblt instruction, right to left, bottom to top
        .globl  test,dest
#BitBlt test program
test:   movqd   0,shift                 # start with shift of zero
loop1:  movd    $108,height             # height in lines
        movqd   1,width                 # start with width of 1 word
loop:   addr    dest,r0                 #point to destination block
        movqd   4,r1                    #increment value
        addr    1024,r2                 #number of patterns to write
        movqd   0,r3                    #pattern to write
        movmpd                          #fill area
        addr    0xff0000,r0             #point to the control base
        movd    width,r2                # get current width
        movd    shift,r1                # get current shift value
        movb    $0x0e,r3                # assume shift is zero
        movqd   -4,r7                   # set destination warp
        movqd   0,r6                    # set source warp
        cmpqd   0,r1                    # is shift zero?
        beq     noinc                   # yes, all is ok, else
        movb    $0x6e,r3                # set up left and right masks
        addqd   1,r2                    # one extra word of destination
        movqd   -2,r7                   # set destination warp
        movqd   2,r6                    # set source warp
noinc:  negb    r2,0x60(r0)             # set up counter
        movb    r3,0x40(r0)             #set up mask register
        movw    bputab[r1:w],0x20(r0)   #set up BPU register
        movw    $0x7,0x22(r0)           #set up BPU register, OR function
        addr    chara + 220,r0          #point to source character
        addr    dest + 1024,r1          #point to destination
        addd    r6,r1                   #pre-increment destination for sh>0
        movd    height,r3               # get current height
        movqd   -2,r4                   #increment value
        muld    r4,r2                   # width = r4 * r2
        movd    r2,r5
        cmpqb   1,$2                    #do pre read
        extblt
        movqb   $0,0x40 + 0xff0000      #clear mask register, disable
                                        #bpu, reset logic
        addr    chara + 218,r0          #point to source char
        addr    dest1 + 1024,r1         #point to destination
        movd    shift,r2                #shift value wanted
        movd    height,r3               #height in lines
        movd    $0xffff,r4              #first mask
        movd    $0xffff,r5              #second mask
        movqd   -2,r6                   #source warp
        movqd   -4,r7                   #dest warp
        movd    width,tos               #width in words
        cmpb    r2,$0
        bbor    -da
        cmpqd   0,tos                   #unstack
        addr    512,r0                  #number of doubles to compare
        addr    dest1,r1                #dest1
        addr    dest,r2                 #dest
        cmpsd                           #compare those strings
        bne     bad
        addqd   1,width                 # next width
        movd    $54*4,r0                # get max lines
        divd    width,r0                # divide to get current lines
        movd    r0,height               # and store it
        cmpqd   1,r0                    # is it OK?
        blt     loop
        addqd   1,shift                 # next shift
        cmpd    $16,shift               # done yet?
        bne     loop1                   # no, back for more
bad:    bpt
        .data
        .data
bputab: .word   0x100f                  # shift of zero, set masks & BIS
        .word   0x1f10                  # shift of one
        .word   0x1e21                  # shift of two
        .word   0x1d32                  # shift of three
        .word   0x1c43                  # 4
        .word   0x1b54                  # 5
        .word   0x1a65                  # 6
        .word   0x1976                  # 7
        .word   0x1887                  # 8
        .word   0x1798                  # 9
        .word   0x16a9                  # 10
        .word   0x15ba                  # 11
        .word   0x14cb                  # 12
        .word   0x13dc                  # 13
        .word   0x12ed                  # 14
        .word   0x11fe                  # 15
width:  .double 0
shift:  .double 0
count:  .double 0
height: .double 0
        .comm   dest,2048
        .comm   dest1,2048



[Timing diagram: CTTL clock with repeated bus states T1 through T4; the signal traces are not recoverable.]

FIGURE 2.9. BPU Model



Figure 2.9 is a block diagram of the functional model of the
BPU as used with the NS32CG16. All the data paths in the
figure are 16 bits wide. The barrel shifter is actually a rotator
that rotates from right to left, i.e., least significant bits, DQ0
to DQ15, are shifted towards the most significant bits. Referring
to Figure 2.9 the data paths to and from the barrel
shifter are A, B and C. Path A is the current source read
data and is loaded into the 16 LSB's of the barrel shifter,
path B is the source read data from the previous word (if in
the middle of a BitBlt block) and is loaded into the
16 MSB's. The data is rotated left by the appropriate number
of bits specified by the SN inputs and the resulting 16
MSB's are output via data path C to the BitBlt Logic Unit
(BLU). Path D contains the 16 bits from the destination read
data, and connects to the BLU destination data input. The
BLU performs the required function and asserts the appropriate
masks and the result is then made available at the
output of the BLU for writing back to the destination BitBlt
address.
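
The barrel rotator data path described above can be modelled in a few
lines of Python. This is a sketch of the description only (paths A, B and
C), not of the DP8511 internals; `barrel_output` is a hypothetical name
and the BIS bit is not modelled.

```python
def barrel_output(previous_word, current_word, shift):
    """Return the 16 bits on path C for a left rotate of 0 to 15 bits.

    previous_word is path B (loaded into the 16 MSBs) and
    current_word is path A (loaded into the 16 LSBs).
    """
    combined = ((previous_word & 0xFFFF) << 16) | (current_word & 0xFFFF)
    shift &= 0xF
    rotated = ((combined << shift) | (combined >> (32 - shift))) & 0xFFFFFFFF
    return rotated >> 16  # the resulting 16 MSBs


print(hex(barrel_output(0x1234, 0xABCD, 0)))
print(hex(barrel_output(0x1234, 0xABCD, 4)))
print(hex(barrel_output(0x1234, 0xABCD, 8)))
```

In this model a shift of zero returns the previous word unchanged, which
may be why the shift-of-zero entry in the example programs' `bputab`
table also sets the BIS bit.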
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Name 


MASK.PLD 


Date 


03/06/89 


Revision 


IB; 


Designer 


Bill Fox 


Company 


NSC; 


Assembly 


APP Note 


Locati on 


U5; 


Device 


pl6r4; 


Part no 


OdSl; 



/* V 

/* BPUMASK: DP8511 MASK AND MASK COUNTER CONTROL */ 

/* V 

/******************************************************************/ 

/* Allowable Target Device Types: PAL16R4A */ 



/** 


Inputs 


"/ 








Pin 


1 


= 


elk 


/* 


tso from cgl6 */ 


Pin 


2 


= 


ncO 


/* 


*/ 


Pin 


3 


= 


!mrco 


/* 


counter ripple carry out */ 


Pin 


4 


= 


ncl 


/* 


'/ 


Pin 


5 


= 


!ddin 


/* 


data direction in */ 


Pin 


6 


= 


Ibpumskwr 


/* 


bpu mask write strobe */ 


Pin 


7 


= 


b_prerd 


/* 


bpu preread */ 


Pin 


8 


= 


b_enbpu 


/* 


enable BPU */ 


Pin 


9 


= 


Ibpucyc 


/* 


bpu cycle V 


Pin 


11 


= 


!oe 


/• 


output enable, always gnd */ 



/** Outputs **/

Pin 12 = !ctr_enbp  ;  /* counter enable p            */
Pin 13 = !ctr_load  ;  /* counter load                */
Pin 14 = !destcyc   ;  /* destination cycle indicator */
Pin 15 = !bmsk      ;  /* beginning mask              */
Pin 16 = !read_cnt  ;  /* count of reads              */
Pin 17 = nc2        ;  /*                             */
Pin 18 = !mask_enb  ;  /* mask enable                 */
Pin 19 = !emask_sel ;  /* ending mask select          */



/** Declarations and Intermediate Variable Definitions **/

bpuread  = bpucyc & ddin;
bpuwrite = bpucyc & !ddin;

field bpuseq = [destcyc, read_cnt];

$define source0 'b'00
$define source1 'b'01
$define dst     'b'10

/** Logic Equations **/

sequence bpuseq {

present source0
    if !b_enbpu
        next source0;
    if b_enbpu & bpuread & b_prerd
        next source1;
    if b_enbpu & bpuread & !b_prerd
        next dst;
    default
        next source0;

present source1
    if !b_enbpu
        next source0;
    if b_enbpu & bpuread
        next dst;
    default
        next source1;

present dst
    if !b_enbpu
        next source0;
    if b_enbpu & bpuwrite & end_mask
        next source0;
    if b_enbpu & bpuwrite & !end_mask
        next source1;
    default
        next dst;

}
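
To make the state transitions easier to follow, the `bpuseq` sequencer can be mirrored in a small behavioral sketch. This is only an illustration under stated assumptions: the function `next_state` is hypothetical, each call stands for one clock edge, and `end_mask` is passed in as a plain input because the listing uses it without declaring it.

```python
# State encodings follow the $define lines of the listing.
SOURCE0, SOURCE1, DST = 0b00, 0b01, 0b10


def next_state(state, b_enbpu, bpuread, bpuwrite, b_prerd, end_mask):
    """Hypothetical one-clock model of the bpuseq state machine."""
    if not b_enbpu:
        return SOURCE0                      # BPU disabled: always idle
    if state == SOURCE0:
        if bpuread and b_prerd:
            return SOURCE1                  # preread taken, source read next
        if bpuread and not b_prerd:
            return DST                      # no preread, go to destination
        return SOURCE0
    if state == SOURCE1:
        return DST if bpuread else SOURCE1
    if state == DST:
        if bpuwrite:
            return SOURCE0 if end_mask else SOURCE1
        return DST                          # destination read: stay in dst
    raise ValueError("unused state encoding")


if __name__ == "__main__":
    s = SOURCE0
    # (bpuread, bpuwrite, end_mask) for each bus cycle, with preread enabled
    for rd, wr, end in [(1, 0, 0), (1, 0, 0), (1, 0, 0), (0, 1, 1)]:
        s = next_state(s, True, bool(rd), bool(wr), True, bool(end))
        print(format(s, "02b"))
```

Driving the model with two source reads, a destination read, and a final write returns it to `source0`, which matches the listing's path for a one-word block with preread.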

bmsk.d    = b_enbpu & (
                !bpumskwr & bpuwrite & mrco      /* bpu write cycle   */
              # !bpumskwr & !bpuwrite & bmsk     /* hold data         */
            )
          # bpumskwr;                            /* load initial mask */

ctr_enbp  = bpuwrite;

ctr_load  = bpuwrite & mrco
          # bpumskwr;

mask_enb  = bmsk
          # mrco;

emask_sel = mrco;



TL/EE/10085-14 
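
The mask register equations of MASK.PLD can be checked with a small behavioral sketch. It assumes that the last term of `bmsk.d` is an OR with `bpumskwr` and that `mask_enb` is `bmsk # mrco`; the function names are hypothetical and all signals are treated as active-true booleans.

```python
def next_bmsk(bmsk, b_enbpu, bpumskwr, bpuwrite, mrco):
    """Next value of the bmsk register (sketch of the bmsk.d equation)."""
    enabled_term = b_enbpu and (
        (not bpumskwr and bpuwrite and mrco)           # bpu write cycle
        or (not bpumskwr and not bpuwrite and bmsk)    # hold data
    )
    return bool(enabled_term or bpumskwr)              # load initial mask


def mask_outputs(bmsk, mrco):
    """Combinational outputs mask_enb and emask_sel."""
    return {"mask_enb": bool(bmsk or mrco), "emask_sel": bool(mrco)}


if __name__ == "__main__":
    b = next_bmsk(False, True, True, False, False)   # mask register written
    print(b)
    b = next_bmsk(b, True, False, False, False)      # idle clock: held
    print(b)
    b = next_bmsk(b, True, False, True, False)       # first BPU write
    print(b)
```

In this reading, `bmsk` is set when the mask register is written, holds until the first BPU write, and is set again on a write that coincides with the counter ripple carry.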






Name      BCTL;
Partno    ;
Date      03/06/89;
Revision  1B;
Designer  George Scolaro;
Company   NSC;
Assembly  BPU interface PAL;
Location  ;
Device    p20r6;

/************************************************************************/
/* Control pal for DP8511 interface                                     */
/* Allowable Target Device Types: PAL20R6A                              */

/** Inputs **/

Pin 1  = clk        ;  /* clock                     */
Pin 2  = !bpucyc    ;  /* bpu cycle in progress     */
Pin 3  = !ddin      ;  /* data direction            */
Pin 4  = !csync     ;  /* 1 t state prior to T3     */
Pin 5  = !destcyc   ;  /* destination bpu cycle     */
Pin 6  = brme       ;  /* beginning right mask      */
Pin 7  = blme       ;  /* beginning left mask       */
Pin 8  = erme       ;  /* ending right mask         */
Pin 9  = elme       ;  /* ending left mask          */
Pin 10 = !mask_sel  ;  /* select left/right mask    */
Pin 11 = !mask_enb  ;  /* enable masks              */
Pin 13 = !oe        ;
Pin 14 = nc0        ;
Pin 23 = nc1        ;

/** Outputs **/

Pin 15 = b_lme      ;  /* left mask enable          */
Pin 16 = !b_dle     ;  /* data input latch          */
Pin 17 = b_bse      ;  /* bpu source enable         */
Pin 18 = !q0        ;  /* internal delay            */
Pin 19 = !q1        ;  /* internal delay            */
Pin 20 = b_fwr      ;  /* fifo write                */
Pin 21 = b_frd      ;  /* fifo read                 */
Pin 22 = b_rme      ;  /* right mask enable         */

/** Declarations and Intermediate Variable Definitions **/

bpusrd   = bpucyc & !destcyc & ddin;
bpudrd   = bpucyc & destcyc & ddin;
bpuwrite = bpucyc & !ddin;

/** Logic Equations **/

b_fwr.d = bpusrd & b_dle;
q0.d    = b_fwr;
q1.d    = q0;
b_frd.d = q1;

b_dle.d = bpudrd & csync
        # bpusrd & csync;

b_bse.d = bpusrd;

b_lme   = elme & mask_enb & mask_sel
        # blme & mask_enb & !mask_sel;

b_rme   = erme & mask_enb & mask_sel
        # brme & mask_enb & !mask_sel;

TL/EE/10085-15
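
The registered BCTL equations form a short delay chain from `b_fwr` through `q0` and `q1` to `b_frd`, and the two mask enables are a simple two-way select. The sketch below is a behavioral illustration only; the function names are hypothetical, every signal is treated as an active-true boolean, and each call to `clock` stands for one rising edge of the PAL clock.

```python
REG_NAMES = ("b_fwr", "q0", "q1", "b_frd", "b_dle", "b_bse")


def clock(regs, bpusrd, bpudrd, csync):
    """Return the register values after one clock edge.

    All registers update together from the values held before the edge.
    """
    return {
        "b_fwr": bool(bpusrd and regs["b_dle"]),
        "q0": regs["b_fwr"],
        "q1": regs["q0"],
        "b_frd": regs["q1"],
        "b_dle": bool((bpudrd or bpusrd) and csync),
        "b_bse": bool(bpusrd),
    }


def mask_enables(blme, brme, elme, erme, mask_enb, mask_sel):
    """b_lme and b_rme: ending masks when mask_sel is set, else beginning."""
    lme = mask_enb and (elme if mask_sel else blme)
    rme = mask_enb and (erme if mask_sel else brme)
    return bool(lme), bool(rme)


if __name__ == "__main__":
    regs = dict.fromkeys(REG_NAMES, False)
    stimulus = [(True, True), (True, False)] + [(False, False)] * 3
    for bpusrd, csync in stimulus:
        regs = clock(regs, bpusrd, False, csync)
        print(regs["b_fwr"], regs["b_frd"])
```

In this model `b_frd` asserts three clocks after `b_fwr`, which is the delay introduced by the `q0` and `q1` stages.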
Name      DECODE.PLD;
Date      07/08/87;
Revision  1A;
Designer  FOX;
Company   NSC;
Assembly  APP Note;
Location  5F;
Device    p16l8;

/******************************************************************/
/* DECODE:    Memory & I/O decode                                 */
/******************************************************************/
/* Allowable Target Device Types: PAL16L8B                        */



/** Inputs **/

Pin [1..9] = [a23..16,ba15] ;  /* address bus   */
Pin 10     = gnd            ;  /* ground        */
Pin 11     = shdwn          ;  /* shadow enable */

/** Outputs **/

Pin 12 = !ram0     ;  /* ram0 enable                          */
Pin 13 = !ram1     ;  /* ram1 enable                          */
Pin 14 = !ram2     ;  /* ram2 enable                          */
Pin 15 = !ram3     ;  /* ram3 enable                          */
Pin 16 = !ramsel   ;  /* common ram select for wait state ctl */
Pin 17 = !promsel  ;  /* prom select                          */
Pin 18 = !iosel    ;  /* io device select                     */



/** Declarations and Intermediate Variable Definitions **/

$define | #

/** Logic Equations **/

field adr = [a23..16,ba15];

romn    = adr:[0100000..013ffff];

ramdcd  = adr:[0200000..0efffff] | (adr:[0..03ffff] & !shdwn);

ram0    = !a17 & !a16 & ramdcd;
ram1    = !a17 &  a16 & ramdcd;
ram2    =  a17 & !a16 & ramdcd;
ram3    =  a17 &  a16 & ramdcd;

ramsel  = ramdcd;

promsel = romn | (shdwn & adr:[0..03ffff]);
iosel   = adr:[0ff0000..0ffffff];
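
The address decode above, including the PROM shadow at the bottom of memory, can be expressed as a short behavioral sketch. The function `decode` and its return format are hypothetical; the address ranges are read as hexadecimal from the DECODE.PLD range expressions, and `shdwn` is treated as an active-true boolean.

```python
def decode(addr: int, shdwn: bool) -> dict:
    """Hypothetical model of the DECODE.PLD select outputs for a 24-bit address."""
    low = 0x000000 <= addr <= 0x03FFFF          # region shadowed by PROM
    romn = 0x100000 <= addr <= 0x13FFFF         # normal PROM window
    ramdcd = (0x200000 <= addr <= 0xEFFFFF) or (low and not shdwn)
    bank = (addr >> 16) & 0x3                   # a17, a16 pick ram0..ram3
    return {
        "ram_bank": bank if ramdcd else None,
        "ramsel": bool(ramdcd),
        "promsel": bool(romn or (shdwn and low)),
        "iosel": 0xFF0000 <= addr <= 0xFFFFFF,
    }


if __name__ == "__main__":
    print(decode(0x000000, shdwn=True))    # shadow on: PROM answers at 0
    print(decode(0x000000, shdwn=False))   # shadow off: RAM answers at 0
    print(decode(0xFF0020, shdwn=False))   # BPU control register: I/O
```

With shadow enabled, the low 256 Kbytes select the PROM; with shadow disabled the same addresses select RAM, and the PROM remains visible at 0x100000.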
Name      WAIT.PLD;
Date      07/08/87;
Revision  1A;
Designer  FOX;
Company   NSC;
Assembly  APP note;
Location  50;
Device    p16r4;

/******************************************************************/
/*                                                                */
/* WAIT.PLD:  Wait recreation logic                               */
/*                                                                */
/* Allowable Target Device Types: PAL16R4A                        */
/******************************************************************/




/** Inputs **/

Pin 1  = cttl    ;  /* CTTL                       */
Pin 2  = t1      ;  /* T1 indication from CPU     */
Pin 3  = t2      ;  /* T2                         */
Pin 4  = !ddin   ;  /* data direction             */
Pin 5  = !dbe    ;  /* data bus enable            */
Pin 7  = !cwait  ;  /* cwait into CG              */
Pin 8  = !wait2  ;  /* wait 2                     */
Pin 9  = !wait1  ;  /* wait 1                     */
Pin 11 = gnd     ;  /* ground                     */

/** Outputs **/

Pin 12       = !iord        ;  /* i/o read strobe             */
Pin [16..14] = ![ctr2..0]   ;  /* wait state counter          */
Pin 17       = !dctr2       ;  /* peripheral strobe for write */
Pin 18       = csync        ;  /* sync strobe to bpu          */
Pin 19       = !iowr        ;  /* i/o write strobe            */




/** Declarations and Intermediate Variable Definitions **/

$define | #

/** Logic Equations **/

field cnt = [ctr2..0];

load    = (cwait & ctr2) | t1;

count   = ctr2;

ctr0.d  = load & !wait1
        | !load & (count $ ctr0);

ctr1.d  = load & !wait2
        | !load & ((count & ctr0) $ ctr1);

ctr2.d  = load
        | !load & ((count & ctr1 & ctr0) $ ctr2);

dctr2.d = ctr2;

cycend  = !cwait & cnt:[7];

!csync  = cycend;

iord    = dctr2 & ddin;

iowr    = (ctr2 & dctr2) & !ddin;
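
The counter equations in WAIT.PLD load the counter with 1, !wait2, !wait1 and then count up while `ctr2` is set. A behavioral sketch of that reading follows. The function names `step` and `clocks_to_seven` are hypothetical, signals are treated as active-true booleans, and the link between this clock count and the number of CPU wait states is an assumption rather than something stated in the listing.

```python
def step(ctr, t1, cwait, wait1, wait2):
    """Next 3-bit counter value after one CTTL edge (ctr2..ctr0 equations)."""
    ctr2 = (ctr >> 2) & 1
    load = (cwait and ctr2) or t1
    if load:
        # ctr2 = 1, ctr1 = !wait2, ctr0 = !wait1
        return 0b100 | (int(not wait2) << 1) | int(not wait1)
    if ctr2:
        return (ctr + 1) & 0b111      # count = ctr2: increment, wrap at 7
    return ctr                        # ctr2 clear: hold


def clocks_to_seven(wait1, wait2):
    """Clocks after the T1 load until cnt reaches 7 (the cycend term)."""
    ctr = step(0, True, False, wait1, wait2)
    n = 0
    while ctr != 7:
        ctr = step(ctr, False, False, wait1, wait2)
        n += 1
    return n


if __name__ == "__main__":
    for w2 in (False, True):
        for w1 in (False, True):
            print(int(w2), int(w1), clocks_to_seven(w1, w2))
```

In this model the number of extra clocks equals the binary value formed by wait2 and wait1, and the counter returns to zero and holds once it steps past 7.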
32CG16 Functional Block Diagram
LIFE SUPPORT POLICY

NATIONAL'S PRODUCTS ARE NOT AUTHORIZED FOR USE AS CRITICAL COMPONENTS IN LIFE SUPPORT
DEVICES OR SYSTEMS WITHOUT THE EXPRESS WRITTEN APPROVAL OF THE PRESIDENT OF NATIONAL
SEMICONDUCTOR CORPORATION. As used herein:

1. Life support devices or systems are devices or systems which, (a) are intended for
   surgical implant into the body, or (b) support or sustain life, and whose failure to
   perform, when properly used in accordance with instructions for use provided in the
   labeling, can be reasonably expected to result in a significant injury to the user.

2. A critical component is any component of a life support device or system whose
   failure to perform can be reasonably expected to cause the failure of the life
   support device or system, or to affect its safety or effectiveness.

National does not assume any responsibility for use of any circuitry described, no circuit patent licenses are implied and National reserves the right at any time without notice to change said circuitry and specifications.



